7-1. Convolutional Neural Networks (CNN advantages, CNN core layers)

Author

이상민

Published

May 1, 2025

1. imports

import torch
import torchvision
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (4.5, 3.0)

2. Advantages of CNNs

A. Better performance

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True)
# keep small subsets so full-batch training fits in GPU memory
train_dataset = torch.utils.data.Subset(train_dataset, range(5000))
test_dataset = torch.utils.data.Subset(test_dataset, range(1000))
to_tensor = torchvision.transforms.ToTensor()
X = torch.stack([to_tensor(img) for img, lbl in train_dataset]).to("cuda:0")
y = torch.tensor([lbl for img, lbl in train_dataset])
y = torch.nn.functional.one_hot(y).float().to("cuda:0")
XX = torch.stack([to_tensor(img) for img, lbl in test_dataset]).to("cuda:0")
yy = torch.tensor([lbl for img, lbl in test_dataset])
yy = torch.nn.functional.one_hot(yy).float().to("cuda:0")

- A fully-connected network designed by sheer brute force

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784,2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048,10)
).to("cuda")
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())
for epoc in range(1,500):
    #1 forward pass
    logits = net(X)
    #2 compute the loss
    loss = loss_fn(logits, y)
    #3 backpropagate
    loss.backward()
    #4 update parameters and reset gradients
    optimizr.step()
    optimizr.zero_grad()

- The epitome of overfitting

(net(X).argmax(axis=1) == y.argmax(axis=1)).float().mean()
tensor(1., device='cuda:0')
(net(XX).argmax(axis=1) == yy.argmax(axis=1)).float().mean()
tensor(0.8530, device='cuda:0')

- A casually thrown-together CNN

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Conv2d(1,16,2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704,10),
).to("cuda")
loss_fn = torch.nn.CrossEntropyLoss()
optimizr = torch.optim.Adam(net.parameters())
for epoc in range(1,500):
    #1 forward pass
    logits = net(X)
    #2 compute the loss
    loss = loss_fn(logits, y)
    #3 backpropagate
    loss.backward()
    #4 update parameters and reset gradients
    optimizr.step()
    optimizr.zero_grad()
(net(X).argmax(axis=1) == y.argmax(axis=1)).float().mean()
tensor(0.9666, device='cuda:0')
(net(XX).argmax(axis=1) == yy.argmax(axis=1)).float().mean()
tensor(0.8710, device='cuda:0')
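Where does the 2704 in `torch.nn.Linear(2704,10)` come from? A quick sketch tracing an input's shape through the same conv/pool stack (layer sizes copied from the network above):

```python
import torch

# Trace one FashionMNIST-sized image through the conv stack.
x = torch.zeros(1, 1, 28, 28)        # one (28, 28) grayscale image
x = torch.nn.Conv2d(1, 16, 2)(x)     # 28 - 2 + 1 = 27  -> (1, 16, 27, 27)
x = torch.nn.MaxPool2d(2)(x)         # 27 // 2 = 13     -> (1, 16, 13, 13)
flat = torch.nn.Flatten()(x)         # 16 * 13 * 13 = 2704
print(flat.shape)                    # torch.Size([1, 2704])
```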

B. Fewer parameters

net1 = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784,2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048,10)
)
net2 = torch.nn.Sequential(
    torch.nn.Conv2d(1,16,2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704,10),
)
net1_params = list(net1.parameters())
print(net1_params[0].shape)
print(net1_params[1].shape)
print(net1_params[2].shape)
print(net1_params[3].shape)
torch.Size([2048, 784])
torch.Size([2048])
torch.Size([10, 2048])
torch.Size([10])
2048*784 + 2048 + 2048*10 +10
1628170
net2_params = list(net2.parameters())
print(net2_params[0].shape)
print(net2_params[1].shape)
print(net2_params[2].shape)
print(net2_params[3].shape)
torch.Size([16, 1, 2, 2])
torch.Size([16])
torch.Size([10, 2704])
torch.Size([10])
16*1*2*2 + 16 + 10*2704 + 10 
27130

- net2 has only about 1.7% as many parameters as net1

27130/1628170
0.01666287918337766
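Instead of reading off each parameter shape by hand, the count can be done in one line with `Tensor.numel()`; `n_params` here is a hypothetical helper, not part of the notes above:

```python
import torch

# Hypothetical helper: count all trainable parameters of a network.
def n_params(net):
    return sum(p.numel() for p in net.parameters())

net2 = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 2),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.Linear(2704, 10),
)
print(n_params(net2))  # 27130
```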

C. Famous

- CNNs are the architecture that kick-started modern deep learning

3. Core CNN layers

A. torch.nn.ReLU

- (Example 1) How it works: negative values become 0

img = torch.randn(1,1,4,4) # one (4,4) grayscale image
relu = torch.nn.ReLU()
img
tensor([[[[ 1.4381,  0.2449, -0.6420,  2.6874],
          [ 0.7790,  1.0558,  0.7939,  0.1099],
          [ 0.3492,  1.7610,  1.6032,  2.4212],
          [ 0.5416, -0.2153, -1.2772,  0.6885]]]])
relu(img)
tensor([[[[1.4381, 0.2449, 0.0000, 2.6874],
          [0.7790, 1.0558, 0.7939, 0.1099],
          [0.3492, 1.7610, 1.6032, 2.4212],
          [0.5416, 0.0000, 0.0000, 0.6885]]]])
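Since ReLU is just an elementwise max(x, 0), the same result can be obtained with `Tensor.clamp` — a small sketch to confirm (the seed is arbitrary):

```python
import torch

torch.manual_seed(0)
img = torch.randn(1, 1, 4, 4)
relu = torch.nn.ReLU()
# ReLU zeroes out negatives elementwise, so clamp(min=0) is identical
print(torch.equal(relu(img), img.clamp(min=0)))  # True
```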

B. torch.nn.MaxPool2d

- (Example 1) How it works, and what kernel_size means

img = torch.rand(1,1,4,4)
mp = torch.nn.MaxPool2d(kernel_size=2)
img
tensor([[[[0.8921, 0.4222, 0.5778, 0.2707],
          [0.6921, 0.5627, 0.5356, 0.1048],
          [0.5356, 0.7699, 0.9047, 0.5911],
          [0.3617, 0.5345, 0.1218, 0.4772]]]])
mp(img)
tensor([[[[0.8921, 0.5778],
          [0.7699, 0.9047]]]])
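Each output entry is the maximum over one non-overlapping 2x2 block, which can be checked by slicing the blocks manually (mirroring the step-by-step Conv2d check later in these notes):

```python
import torch

torch.manual_seed(0)
img = torch.rand(1, 1, 4, 4)
mp = torch.nn.MaxPool2d(kernel_size=2)
# build the expected output by taking the max of each 2x2 block by hand
manual = torch.stack([
    torch.stack([img[0, 0, :2, :2].max(), img[0, 0, :2, 2:].max()]),
    torch.stack([img[0, 0, 2:, :2].max(), img[0, 0, 2:, 2:].max()]),
])
print(torch.equal(mp(img)[0, 0], manual))  # True
```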

- (Example 2) What if the kernel doesn't divide the image size evenly?

img = torch.rand(1,1,5,5)
mp = torch.nn.MaxPool2d(kernel_size=3)
img
tensor([[[[0.9560, 0.4947, 0.1591, 0.2606, 0.9130],
          [0.0603, 0.1255, 0.6520, 0.2504, 0.8759],
          [0.7544, 0.5927, 0.5319, 0.2390, 0.2883],
          [0.9470, 0.8519, 0.3501, 0.0725, 0.3881],
          [0.7203, 0.0753, 0.8360, 0.1287, 0.9515]]]])
mp(img)
tensor([[[[0.9560]]]])
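The output is only 1x1 because, by default, rows and columns that don't fill a complete 3x3 window are simply dropped. The `ceil_mode` parameter of `MaxPool2d` keeps the partial windows instead:

```python
import torch

torch.manual_seed(0)
img = torch.rand(1, 1, 5, 5)
# default (ceil_mode=False): leftover rows/cols are dropped -> 1x1 output
print(torch.nn.MaxPool2d(3)(img).shape)                  # (1, 1, 1, 1)
# ceil_mode=True: partial windows are pooled too -> 2x2 output
print(torch.nn.MaxPool2d(3, ceil_mode=True)(img).shape)  # (1, 1, 2, 2)
```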

- (Example 3) A non-square kernel

img = torch.rand(1,1,4,4)
mp = torch.nn.MaxPool2d(kernel_size=(4,2))
img
tensor([[[[0.4283, 0.9998, 0.3532, 0.3085],
          [0.3278, 0.8575, 0.3331, 0.9769],
          [0.0239, 0.2457, 0.8468, 0.8224],
          [0.9593, 0.1292, 0.5930, 0.3652]]]])
mp(img)
tensor([[[[0.9998, 0.9769]]]])

C. torch.nn.Conv2d

- (Example 1) How it works, stride=2

img = torch.rand(1,1,4,4)
conv = torch.nn.Conv2d(in_channels=1,out_channels=1,kernel_size=2,stride=2)
img
tensor([[[[0.7679, 0.3459, 0.6509, 0.7905],
          [0.1166, 0.8762, 0.9373, 0.8573],
          [0.5778, 0.8702, 0.9686, 0.5854],
          [0.1373, 0.3530, 0.0529, 0.0139]]]])
conv(img)
tensor([[[[ 0.1106, -0.1898],
          [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>)

- The computation, step by step

conv.weight.data, conv.bias.data
(tensor([[[[-0.0218,  0.2400],
           [-0.4914,  0.3394]]]]),
 tensor([-0.1958]))
(img[:,  :,  :2,  :2] * conv.weight.data).sum()+conv.bias.data, conv(img)
(tensor([0.1106]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))
(img[:,  :,  :2,  2:] * conv.weight.data).sum()+conv.bias.data, conv(img)
(tensor([-0.1898]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))
(img[:,  :,  2:,  :2] * conv.weight.data).sum()+conv.bias.data, conv(img)
(tensor([0.0529]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))
(img[:,  :,  2:,  2:] * conv.weight.data).sum()+conv.bias.data, conv(img)
(tensor([-0.0976]),
 tensor([[[[ 0.1106, -0.1898],
           [ 0.0529, -0.0976]]]], grad_fn=<ConvolutionBackward0>))
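The 2x2 output shape above follows the general Conv2d size rule: with no padding, an H x H input, kernel size K, and stride S give an output side of floor((H - K) / S) + 1. A short sketch checking the formula for this example:

```python
import torch

# Conv2d output side (no padding): floor((H - K) / S) + 1
H, K, S = 4, 2, 2
out = (H - K) // S + 1                        # (4 - 2) // 2 + 1 = 2
conv = torch.nn.Conv2d(1, 1, kernel_size=K, stride=S)
print(conv(torch.rand(1, 1, H, H)).shape)     # torch.Size([1, 1, 2, 2])
```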